The simplest Haskell program is `main = pure ()`. It does nothing. Saved as `Main.hs`, it can be compiled to an executable file with GHC (for example, GHC 9.12.2) by commanding:
```
> stack --snapshot ghc-9.12.2 ghc -- Main.hs
```
This command will output `Main.hi`, `Main.o` and `Main.exe`, the last being the executable file. We can find its size in bytes by commanding (in PowerShell):
```
> (dir Main.exe).Length
10150400
```
About 10.15 million bytes is not small for an executable that does nothing.
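The same measurement can be taken from Haskell itself, which may help when scripting comparisons. A minimal sketch using only the `base` library; the name `fileSizeInBytes` is hypothetical:

```haskell
import System.IO (IOMode (ReadMode), hFileSize, withFile)

-- Size of a file in bytes, read through a handle (base library only).
-- A Haskell counterpart to PowerShell's (dir Main.exe).Length.
fileSizeInBytes :: FilePath -> IO Integer
fileSizeInBytes path = withFile path ReadMode hFileSize
```

In GHCi, `fileSizeInBytes "Main.exe" >>= print` should print the same number as the PowerShell command.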
Linking
By making GHC verbose, with its `-v` flag, and looking for `-l` options during the linking step, it is possible to identify what is linked in producing `Main.exe`:
```
> stack --snapshot ghc-9.12.2 ghc -- -v Main.hs
HSbase-4.21.0.0-55e3
HSghc-internal-9.1202.0-cfbf
HSghc-bignum-1.3-89fe
HSghc-prim-0.13.0-869e
HSrts-1.0.2
Cffi-6     -- Statically linked
wsock32    -- Dynamically linked
user32     -- Dynamically linked
shell32    -- Dynamically linked
mingw32    -- Statically linked
kernel32   -- Dynamically linked
advapi32   -- Dynamically linked, if needed (not needed, directly)
mingwex    -- Statically linked
ws2_32     -- Dynamically linked
shlwapi    -- Dynamically linked, if needed (not needed, directly)
ole32      -- Dynamically linked, if needed (not needed, directly)
rpcrt4     -- Dynamically linked, if needed (not needed, directly)
ntdll      -- Dynamically linked, ntdll.dll
user32     -- As above
mingw32    -- As above
mingwex    -- As above
ucrt       -- Dynamically linked, if needed (not needed, directly)
m          -- Statically linked
wsock32    -- As above
gdi32      -- Dynamically linked, if needed (not needed, directly)
winmm      -- Dynamically linked
dbghelp    -- Dynamically linked
psapi      -- Dynamically linked, if needed (not needed, directly)
pthread    -- Statically linked
```
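The search for `-l` options can itself be scripted. A minimal Haskell sketch, assuming that each option appears in the `-v` output as a single whitespace-separated token of the form `-lNAME`, possibly wrapped in quotes (the exact quoting GHC uses is an assumption); the function names are hypothetical:

```haskell
import Data.List (nub)

-- Hypothetical helper: collect library names from -lNAME options in the text
-- of GHC's verbose (-v) output. Assumes each option is one whitespace-separated
-- token, possibly wrapped in quotes; a separate "-l NAME" form is not handled.
linkedLibraries :: String -> [String]
linkedLibraries = concatMap libraryName . words
  where
    libraryName token = case filter (`notElem` "\"'") token of
      '-' : 'l' : name@(_ : _) -> [name]
      _ -> []

-- The same names with repeats removed, keeping the first occurrence of each.
distinctLibraries :: String -> [String]
distinctLibraries = nub . linkedLibraries
```

`linkedLibraries` keeps repeats deliberately: in the list above, `user32`, `mingw32`, `mingwex` and `wsock32` each appear twice.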
On Windows, Stack comes with certain tools provided by the MSYS2 project. They include the Cygwin project’s `ldd.exe` tool:
```
> stack --snapshot ghc-9.12.2 exec -- where.exe ldd
D:\sr\programs\x86_64-windows\msys2-20240727\usr\bin\ldd.exe
> stack --snapshot ghc-9.12.2 exec -- ldd --version
ldd (cygwin) 3.5.3
Print shared library dependencies
Copyright (C) 2009 - 2024 Chris Faylor
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
We can identify what is actually dynamically loaded by commanding:
```
> stack --snapshot ghc-9.12.2 exec -- ldd Main.exe
```
and examining the results.
Microsoft also provides the tool `dumpbin.exe` (the Microsoft COFF Binary File Dumper) as part of Build Tools for Visual Studio 2022. We can identify the DLLs that are actually imported by `Main.exe` using the command:
```
> &"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.44.35207\bin\Hostx86\x86\dumpbin.exe" /DEPENDENTS Main.exe
Microsoft (R) COFF/PE Dumper Version 14.44.35217.0
Copyright (C) Microsoft Corporation. All rights reserved.


Dump of file Main.exe

File Type: EXECUTABLE IMAGE

  Image has the following dependencies:

    KERNEL32.dll
    api-ms-win-crt-heap-l1-1-0.dll
    api-ms-win-crt-private-l1-1-0.dll
    api-ms-win-crt-runtime-l1-1-0.dll
    api-ms-win-crt-stdio-l1-1-0.dll
    api-ms-win-crt-string-l1-1-0.dll
    SHELL32.dll
    api-ms-win-crt-environment-l1-1-0.dll
    api-ms-win-crt-convert-l1-1-0.dll
    api-ms-win-crt-locale-l1-1-0.dll
    api-ms-win-crt-math-l1-1-0.dll
    api-ms-win-crt-time-l1-1-0.dll
    api-ms-win-crt-filesystem-l1-1-0.dll
    USER32.dll
    dbghelp.dll
    api-ms-win-crt-utility-l1-1-0.dll
    ntdll.dll
    WSOCK32.dll
    WS2_32.dll
    WINMM.dll
LINK : warning LNK4078: multiple '.rdata' sections found with different attributes (C0000040)

  Summary

        1000 .buildid
      154000 .data
        5000 .debug_abbrev
        1000 .debug_aranges
        1000 .debug_frame
       14000 .debug_info
       11000 .debug_line
       25000 .debug_loc
        4000 .debug_ranges
        A000 .debug_str
        3000 .pdata
       13000 .rdata
       3B000 .rdata
       40000 .reloc
        1000 .rsrc
      4A6000 .text
        1000 .tls
```
The results of both tools are reflected in the comments in the list above.
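Matching the linked libraries against the DLLs that `dumpbin` reports can also be sketched in Haskell. The helper name `splitByImport` is hypothetical, and it assumes that a library `name` corresponds to a DLL `name.dll`, compared case-insensitively:

```haskell
import Data.Char (toLower)
import Data.List (nub, partition)

-- Hypothetical helper: split linker library names (such as "user32") into
-- those whose DLL appears among the dependents printed by dumpbin
-- /DEPENDENTS (such as "USER32.dll") and those whose DLL does not.
-- Names are compared case-insensitively; repeats are dropped first.
splitByImport :: [String] -> [String] -> ([String], [String])
splitByImport libs dlls = partition isImported (nub libs)
  where
    lower = map toLower
    importedDlls = map lower dlls
    isImported lib = (lower lib ++ ".dll") `elem` importedDlls
```

For example, `splitByImport ["user32", "mingw32", "advapi32"] ["KERNEL32.dll", "USER32.dll"]` gives `(["user32"], ["mingw32", "advapi32"])`. The second list mixes static libraries (such as `mingw32`) with import libraries whose DLL was not needed (such as `advapi32`), so the sketch cannot tell those cases apart; nor does its naming rule match DLLs such as the `api-ms-win-crt-*.dll` family to any library name.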
Executable files
On Windows, valid executable files are in the Portable Executable (PE) format. The PE format extends the Common Object File Format (COFF).
A COFF header specifies the number of sections and is followed by the section table (a sequence of section headers); in an image file such as `Main.exe`, an optional header sits between the two. A section header uses 32 bits for flags that specify the characteristics of the section, as set out in the table below.
Value (hexadecimal) | Flag name (prefixed by IMAGE_SCN_) | Description |
---|---|---|
00000000 | TYPE_REG** | Reserved for future use. |
00000001 | TYPE_DSECT** | Reserved for future use. |
00000002 | TYPE_NO_LOAD** | Reserved for future use. |
00000004 | TYPE_GROUP** | Reserved for future use. |
00000008 | TYPE_NO_PAD | Obsolete.* |
00000010 | TYPE_COPY** | Reserved for future use. |
00000020 | CNT_CODE | The section contains executable code. |
00000040 | CNT_INITIALIZED_DATA | The section contains initialized data. |
00000080 | CNT_UNINITIALIZED_DATA | The section contains uninitialized data. |
00000100 | LNK_OTHER | Reserved for future use. |
00000200 | LNK_INFO | The section contains comments or other information.* |
00000400 | TYPE_OVER** | Reserved for future use. |
00000800 | LNK_REMOVE | The section will not become part of the image.* |
00001000 | LNK_COMDAT | The section contains COMDAT data.* |
00008000 | GPREL | The section contains data referenced through the global pointer. |
00020000 | MEM_PURGEABLE | Reserved for future use. |
00020000 | MEM_16BIT | Reserved for future use. |
00040000 | MEM_LOCKED | Reserved for future use. |
00080000 | MEM_PRELOAD | Reserved for future use. |
00100000 .. 00E00000 | ALIGN_1BYTES to ALIGN_8192BYTES | Align data on a specified byte boundary.* |
01000000 | LNK_NRELOC_OVFL | The section contains extended relocations. |
02000000 | MEM_DISCARDABLE | The section can be discarded as needed. |
04000000 | MEM_NOT_CACHED | The section cannot be cached. |
08000000 | MEM_NOT_PAGED | The section is not pageable. |
10000000 | MEM_SHARED | The section can be shared in memory. |
20000000 | MEM_EXECUTE | The section can be executed as code. |
40000000 | MEM_READ | The section can be read. |
80000000 | MEM_WRITE | The section can be written to. |
* Flag is valid only for object files.
** Deduced from .NET SectionCharacteristics enumeration documentation.
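To decode a characteristics value by hand, each set bit is looked up in the table, except for the alignment, which occupies bits 20 to 23 as a number n from 1 to 14 meaning a 2^(n-1)-byte boundary (so 00100000 is ALIGN_1BYTES and 00E00000 is ALIGN_8192BYTES). A minimal Haskell sketch of that lookup, with hypothetical function names and the reserved bits left out, might be:

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word32)

-- Single-bit section flags from the table (reserved bits omitted).
sectionFlags :: [(Word32, String)]
sectionFlags =
  [ (0x00000008, "TYPE_NO_PAD")
  , (0x00000020, "CNT_CODE")
  , (0x00000040, "CNT_INITIALIZED_DATA")
  , (0x00000080, "CNT_UNINITIALIZED_DATA")
  , (0x00000200, "LNK_INFO")
  , (0x00000800, "LNK_REMOVE")
  , (0x00001000, "LNK_COMDAT")
  , (0x00008000, "GPREL")
  , (0x01000000, "LNK_NRELOC_OVFL")
  , (0x02000000, "MEM_DISCARDABLE")
  , (0x04000000, "MEM_NOT_CACHED")
  , (0x08000000, "MEM_NOT_PAGED")
  , (0x10000000, "MEM_SHARED")
  , (0x20000000, "MEM_EXECUTE")
  , (0x40000000, "MEM_READ")
  , (0x80000000, "MEM_WRITE")
  ]

-- The alignment is a 4-bit field, not a single bit: a value n in 1..14
-- in bits 20-23 means alignment on a 2^(n-1)-byte boundary.
alignmentFlag :: Word32 -> Maybe String
alignmentFlag c
  | n >= 1 && n <= 14 = Just ("ALIGN_" ++ show (2 ^ (n - 1) :: Int) ++ "BYTES")
  | otherwise = Nothing
  where
    n = fromIntegral ((c .&. 0x00F00000) `shiftR` 20) :: Int

-- Names of all flags set in a characteristics value, prefixed IMAGE_SCN_.
decodeCharacteristics :: Word32 -> [String]
decodeCharacteristics c =
  map ("IMAGE_SCN_" ++) $
    [name | (bit, name) <- sectionFlags, c .&. bit /= 0]
      ++ maybe [] pure (alignmentFlag c)
```

For example, `decodeCharacteristics 0xC0000040`, the attribute value quoted in the LNK4078 warning from `dumpbin` above, gives `["IMAGE_SCN_CNT_INITIALIZED_DATA","IMAGE_SCN_MEM_READ","IMAGE_SCN_MEM_WRITE"]`: initialized data that can be read and written.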
On Windows, GHC comes with certain tools. They include `objdump.exe`:
```
> stack --snapshot ghc-9.12.2 exec -- where.exe objdump
D:\sr\programs\x86_64-windows\ghc-9.12.2\mingw\bin\objdump.exe
> stack --snapshot ghc-9.12.2 exec -- objdump --version
LLVM (http://llvm.org/):
  LLVM version 14.0.6
  Optimized build.
  Default target: x86_64-w64-windows-gnu
  Host CPU: goldmont

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
    amdgcn     - AMD GCN GPUs
    arm64      - ARM64 (little endian)
    arm64_32   - ARM64 (little endian ILP32)
    r600       - AMD GPUs HD2XXX-HD6XXX
    wasm32     - WebAssembly 32-bit
    wasm64     - WebAssembly 64-bit
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
```
This is the LLVM tool `llvm-objdump`. Its `-h` flag displays summaries of the headers for each section. We can command:
```
> stack --snapshot ghc-9.12.2 exec -- objdump -h Main.exe

Main.exe: file format coff-x86-64

Sections:
Idx Name            Size     VMA              Type
  0 .text           004a59c6 0000000140001000 TEXT
  1 .rdata          00012258 00000001404a7000 DATA
  2 .buildid        00000035 00000001404ba000 DATA
  3 .data           00146200 00000001404bb000 DATA
  4 .pdata          00002fe8 000000014060f000 DATA
  5 .rdata          0003abb8 0000000140612000 DATA
  6 .tls            00000010 000000014064d000 DATA
  7 .rsrc           00000258 000000014064e000 DATA
  8 .reloc          0003fe04 000000014064f000 DATA
  9 .debug_abbrev   00004bd5 000000014068f000 DATA, DEBUG
 10 .debug_aranges  000001e0 0000000140694000 DATA, DEBUG
 11 .debug_frame    000000f0 0000000140695000 DATA, DEBUG
 12 .debug_info     00013166 0000000140696000 DATA, DEBUG
 13 .debug_line     00010c4b 00000001406aa000 DATA, DEBUG
 14 .debug_loc      00024452 00000001406bb000 DATA, DEBUG
 15 .debug_ranges   00003d40 00000001406e0000 DATA, DEBUG
 16 .debug_str      00009a27 00000001406e4000 DATA, DEBUG
```
`objdump` translates combinations of COFF section characteristics into a ‘type’, such as `TEXT`, `DATA` or `DEBUG`.
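One plausible reading of that translation, not checked here against `llvm-objdump`'s source and so an assumption, is that `TEXT`, `DATA` and `BSS` follow the `CNT_` flags in the table above, while `DEBUG` is added for sections whose names begin `.debug`. A Haskell sketch of that reading, with the hypothetical name `sectionType`:

```haskell
import Data.Bits ((.&.))
import Data.List (intercalate, isPrefixOf)
import Data.Word (Word32)

-- Sketch (an assumption, not llvm-objdump's code) of how a COFF section
-- might be labelled: content flags give TEXT, DATA and BSS, and a name
-- beginning ".debug" adds DEBUG. Labels are joined with ", ".
sectionType :: String -> Word32 -> String
sectionType name characteristics =
  intercalate ", " $ concat
    [ ["TEXT"  | has 0x00000020]               -- IMAGE_SCN_CNT_CODE
    , ["DATA"  | has 0x00000040]               -- IMAGE_SCN_CNT_INITIALIZED_DATA
    , ["BSS"   | has 0x00000080]               -- IMAGE_SCN_CNT_UNINITIALIZED_DATA
    , ["DEBUG" | ".debug" `isPrefixOf` name]
    ]
  where
    has flag = characteristics .&. flag /= 0
```

With illustrative characteristics values, `sectionType ".text" 0x60000020` gives `"TEXT"` and `sectionType ".debug_info" 0x42100040` gives `"DATA, DEBUG"`, matching the shape of the listing above.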
The largest sections of `Main.exe` are `.text` (about 4.87 million bytes; `0x4a59c6`) and `.data` (about 1.34 million bytes; `0x146200`).
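The hexadecimal Size column can be converted and totalled mechanically. A minimal sketch, assuming the row layout of the LLVM listing above (index, name, size, VMA, then type labels); the names `Section`, `parseSections` and `totalSize` are hypothetical:

```haskell
import Data.Char (isDigit)
import Data.Maybe (mapMaybe)
import Numeric (readHex)

-- One row of the section summary printed by llvm-objdump -h.
data Section = Section
  { secName  :: String
  , secSize  :: Integer   -- bytes, converted from the hexadecimal Size column
  , secTypes :: [String]  -- e.g. ["DATA", "DEBUG"]
  } deriving Show

-- Parse a row such as " 9 .debug_abbrev 00004bd5 000000014068f000 DATA, DEBUG".
-- Rows are recognised by a leading numeric index; other lines are ignored.
parseSection :: String -> Maybe Section
parseSection line = case words line of
  idx : name : size : _vma : types
    | all isDigit idx
    , [(n, "")] <- readHex size ->
        Just (Section name n (map (filter (/= ',')) types))
  _ -> Nothing

parseSections :: String -> [Section]
parseSections = mapMaybe parseSection . lines

-- Total size in bytes of the sections carrying a given type label.
totalSize :: String -> [Section] -> Integer
totalSize label secs = sum [secSize s | s <- secs, label `elem` secTypes s]
```

Applied to the complete listing above, the `.text` row parses to 4,872,646 bytes and `totalSize "DEBUG"` comes to 370,703 bytes. The GNU layout shown later for GHC 8.10.7 would parse its sizes correctly, but not its types.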
Stripping
The tools that come with GHC on Windows include `strip.exe`:
```
> stack --snapshot ghc-9.12.2 exec -- where.exe strip
D:\sr\programs\x86_64-windows\ghc-9.12.2\mingw\bin\strip.exe
> stack --snapshot ghc-9.12.2 exec -- strip --version
llvm-strip, compatible with GNU strip
LLVM (http://llvm.org/):
  LLVM version 14.0.6
  Optimized build.
  Default target: x86_64-w64-windows-gnu
  Host CPU: goldmont
```
The LLVM tool `llvm-strip` is intended to work as a drop-in replacement for GNU’s `strip`. Running it without options is equivalent to running it with the `--strip-all` flag. For COFF files, `--strip-all` removes all symbols, debug sections and relocations from the output. The tool modifies the input file in place.
Cabal (the library) defaults to its `--enable-executable-stripping` flag, which causes Cabal’s `copy` and `install` commands to seek to run `strip` on the installed executable file. Here, we can run `strip` on `Main.exe` directly and then check its size:
```
> stack --snapshot ghc-9.12.2 exec -- strip Main.exe
> (dir Main.exe).Length
6801408
```
Stripping has removed all the `DEBUG` sections:
```
> stack --snapshot ghc-9.12.2 exec -- objdump -h Main.exe

Main.exe: file format coff-x86-64

Sections:
Idx Name            Size     VMA              Type
  0 .text           004a59c6 0000000140001000 TEXT
  1 .rdata          00012258 00000001404a7000 DATA
  2 .buildid        00000035 00000001404ba000 DATA
  3 .data           00146200 00000001404bb000 DATA
  4 .pdata          00002fe8 000000014060f000 DATA
  5 .rdata          0003abb8 0000000140612000 DATA
  6 .tls            00000010 000000014064d000 DATA
  7 .rsrc           00000258 000000014064e000 DATA
  8 .reloc          0003fe04 000000014064f000 DATA
```
Although smaller, about 6.80 million bytes is still not small for a program that does nothing.
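The saving can be set against the `DEBUG` sections in the first `objdump -h` listing, converting their hexadecimal Size column to decimal:

$$
\begin{aligned}
10\,150\,400 - 6\,801\,408 &= 3\,348\,992 \text{ bytes removed by strip} \\
\texttt{0x4bd5} + \texttt{0x1e0} + \texttt{0xf0} + \texttt{0x13166} + \texttt{0x10c4b} + \texttt{0x24452} + \texttt{0x3d40} + \texttt{0x9a27} &= 370\,703 \text{ bytes of DEBUG sections}
\end{aligned}
$$

So the `DEBUG` sections account for only about 11% of the saving. The remainder is presumably mostly the symbol table (with its string table), which `--strip-all` also removes and which, not being a section, `objdump -h` does not list.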
GHC 8.10.7
The output of GHC 8.10.7 was smaller. We can compare:
```
> stack --snapshot ghc-8.10.7 exec -- objdump -h Main.exe

Main.exe:     file format pei-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         000c9920  0000000000401000  0000000000401000  00000400  2**6
                  CONTENTS, ALLOC, LOAD, READONLY, CODE, DATA
  1 .data         0001dc80  00000000004cb000  00000000004cb000  000c9e00  2**6
                  CONTENTS, ALLOC, LOAD, DATA
  2 .rdata        00035c00  00000000004e9000  00000000004e9000  000e7c00  2**6
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .pdata        000045cc  000000000051f000  000000000051f000  0011d800  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .xdata        00003dd8  0000000000524000  0000000000524000  00121e00  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .bss          0000164c  0000000000528000  0000000000528000  00000000  2**5
                  ALLOC
  6 .idata        000024f0  000000000052a000  000000000052a000  00125c00  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  7 .CRT          00000068  000000000052d000  000000000052d000  00128200  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  8 .tls          00000010  000000000052e000  000000000052e000  00128400  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  9 .rsrc         00000738  000000000052f000  000000000052f000  00128600  2**2
                  CONTENTS, ALLOC, LOAD, DATA
```
In the case of GHC 8.10.7, `objdump.exe` is the GNU project’s tool.
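Converting the Size columns of the two largest sections to decimal makes the comparison with GHC 9.12.2 concrete:

$$
\begin{aligned}
\texttt{.text}:&\quad \texttt{0x4a59c6} = 4\,872\,646 \text{ (GHC 9.12.2)} \quad\text{vs}\quad \texttt{0xc9920} = 825\,632 \text{ (GHC 8.10.7)}, \text{ a ratio of about } 5.9 \\
\texttt{.data}:&\quad \texttt{0x146200} = 1\,335\,808 \text{ (GHC 9.12.2)} \quad\text{vs}\quad \texttt{0x1dc80} = 121\,984 \text{ (GHC 8.10.7)}, \text{ a ratio of about } 11
\end{aligned}
$$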